
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta charset="utf-8" />
    <title>html.parser --- Simple HTML and XHTML parser &#8212; Python 3.7.8 documentation</title>
    <link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <script type="text/javascript" src="../_static/language_data.js"></script>
    <script type="text/javascript" src="../_static/translations.js"></script>
    
    <script type="text/javascript" src="../_static/sidebar.js"></script>
    
    <link rel="search" type="application/opensearchdescription+xml"
          title="Search within Python 3.7.8 documentation"
          href="../_static/opensearch.xml"/>
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="copyright" title="Copyright" href="../copyright.html" />
    <link rel="next" title="html.entities --- Definitions of HTML general entities" href="html.entities.html" />
    <link rel="prev" title="html --- HyperText Markup Language support" href="html.html" />
    <link rel="shortcut icon" type="image/png" href="../_static/py.png" />
    <link rel="canonical" href="https://docs.python.org/3/library/html.parser.html" />
    
    <script type="text/javascript" src="../_static/copybutton.js"></script>
    
    
    
    
    <style>
      @media only screen {
        table.full-width-table {
            width: 100%;
        }
      }
    </style>
 

  </head><body>
  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="module-html.parser">
<span id="html-parser-simple-html-and-xhtml-parser"></span><h1><a class="reference internal" href="#module-html.parser" title="html.parser: A simple parser that can handle HTML and XHTML."><code class="xref py py-mod docutils literal notranslate"><span class="pre">html.parser</span></code></a> --- Simple HTML and XHTML parser<a class="headerlink" href="#module-html.parser" title="Permalink to this headline">¶</a></h1>
<p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.7/Lib/html/parser.py">Lib/html/parser.py</a></p>
<hr class="docutils" id="index-0" />
<p>This module defines a class <a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> which serves as the basis for parsing text files formatted in HTML (HyperText Markup Language) and XHTML.</p>
<dl class="class">
<dt id="html.parser.HTMLParser">
<em class="property">class </em><code class="sig-prename descclassname">html.parser.</code><code class="sig-name descname">HTMLParser</code><span class="sig-paren">(</span><em class="sig-param">*</em>, <em class="sig-param">convert_charrefs=True</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser" title="Permalink to this definition">¶</a></dt>
<dd><p>Create a parser instance able to parse invalid markup.</p>
<p>If <em>convert_charrefs</em> is <code class="docutils literal notranslate"><span class="pre">True</span></code> (the default), all character references (except the ones in <code class="docutils literal notranslate"><span class="pre">script</span></code>/<code class="docutils literal notranslate"><span class="pre">style</span></code> elements) are automatically converted to the corresponding Unicode characters.</p>
<p>An <a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> instance is fed HTML data and calls handler methods when start tags, end tags, text, comments, and other markup elements are encountered. To implement the desired behavior, subclass <a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> and override its methods.</p>
<p>This parser does not check that end tags match start tags, nor does it call the end-tag handler for elements that are closed implicitly by closing an outer element.</p>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.4: </span><em>convert_charrefs</em> keyword argument added.</p>
</div>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.5: </span>The default value for argument <em>convert_charrefs</em> is now <code class="docutils literal notranslate"><span class="pre">True</span></code>.</p>
</div>
</dd></dl>
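<p>To see what <em>convert_charrefs</em> changes, the following sketch records which handlers fire for the same input under both settings. <code class="docutils literal notranslate">Collector</code> and <code class="docutils literal notranslate">run()</code> are hypothetical names used only for illustration; the commented outputs reflect the parser behavior described above.</p>

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Hypothetical helper: record each handler call as a (kind, value) tuple."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def run(convert):
    parser = Collector(convert_charrefs=convert)
    parser.feed("<p>a &amp; b &#62; c</p>")
    parser.close()
    return parser.events


# Default (True): references are decoded inside the text passed to handle_data().
print(run(True))   # [('data', 'a & b > c')]
# False: the text is split up and the reference handlers are called separately.
print(run(False))  # [('data', 'a '), ('entityref', 'amp'), ('data', ' b '), ('charref', '62'), ('data', ' c')]
```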

<div class="section" id="example-html-parser-application">
<h2>Example HTML Parser Application<a class="headerlink" href="#example-html-parser-application" title="Permalink to this headline">¶</a></h2>
<p>As a basic example, below is a simple HTML parser that uses the <a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> class to print out start tags, end tags, and data as they are encountered:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">html.parser</span> <span class="kn">import</span> <span class="n">HTMLParser</span>

<span class="k">class</span> <span class="nc">MyHTMLParser</span><span class="p">(</span><span class="n">HTMLParser</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">handle_starttag</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tag</span><span class="p">,</span> <span class="n">attrs</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Encountered a start tag:&quot;</span><span class="p">,</span> <span class="n">tag</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_endtag</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tag</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Encountered an end tag :&quot;</span><span class="p">,</span> <span class="n">tag</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Encountered some data  :&quot;</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>

<span class="n">parser</span> <span class="o">=</span> <span class="n">MyHTMLParser</span><span class="p">()</span>
<span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;html&gt;&lt;head&gt;&lt;title&gt;Test&lt;/title&gt;&lt;/head&gt;&#39;</span>
            <span class="s1">&#39;&lt;body&gt;&lt;h1&gt;Parse me!&lt;/h1&gt;&lt;/body&gt;&lt;/html&gt;&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>Output:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data  : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: h1
Encountered some data  : Parse me!
Encountered an end tag : h1
Encountered an end tag : body
Encountered an end tag : html
</pre></div>
</div>
</div>
<div class="section" id="htmlparser-methods">
<h2><a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> Methods<a class="headerlink" href="#htmlparser-methods" title="Permalink to this headline">¶</a></h2>
<p><a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> instances have the following methods:</p>
<dl class="method">
<dt id="html.parser.HTMLParser.feed">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">feed</code><span class="sig-paren">(</span><em class="sig-param">data</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.feed" title="Permalink to this definition">¶</a></dt>
<dd><p>Feed some text to the parser. Complete elements in the text are processed immediately; incomplete data is buffered until more data is fed or <a class="reference internal" href="#html.parser.HTMLParser.close" title="html.parser.HTMLParser.close"><code class="xref py py-meth docutils literal notranslate"><span class="pre">close()</span></code></a> is called. <em>data</em> must be <a class="reference internal" href="stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a>.</p>
</dd></dl>
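<p>Because incomplete input is buffered, <code class="docutils literal notranslate">feed()</code> can be called with arbitrary slices of a document. A minimal sketch, using a hypothetical <code class="docutils literal notranslate">TagLogger</code> subclass, of a start tag that is split across two calls:</p>

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that remembers the names of start tags."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append(tag)


parser = TagLogger()
parser.feed("<di")                  # incomplete start tag: buffered, no callback yet
seen_after_first = list(parser.starttags)
parser.feed('v id="x">')            # the tag is now complete and gets processed
parser.close()

print(seen_after_first)   # []
print(parser.starttags)   # ['div']
```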

<dl class="method">
<dt id="html.parser.HTMLParser.close">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">close</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.close" title="Permalink to this definition">¶</a></dt>
<dd><p>Force processing of all buffered data as if it were followed by an end-of-file mark. This method may be redefined by a derived class to define additional processing at the end of the input, but the redefined version should always call the <a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> base class method <a class="reference internal" href="#html.parser.HTMLParser.close" title="html.parser.HTMLParser.close"><code class="xref py py-meth docutils literal notranslate"><span class="pre">close()</span></code></a>.</p>
</dd></dl>
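<p>A sketch of the override pattern described above, assuming a hypothetical <code class="docutils literal notranslate">TextCollector</code> that builds its final result only once all input has been seen. With the default <em>convert_charrefs=True</em>, text ending in a bare <code class="docutils literal notranslate">&amp;</code> may be held back in case it starts a character reference, so calling the base class <code class="docutils literal notranslate">close()</code> first makes sure such trailing text is delivered before the result is assembled.</p>

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical subclass: gather all text and finalize it in close()."""

    def __init__(self):
        super().__init__()
        self._parts = []
        self.text = None  # only set once close() has run

    def handle_data(self, data):
        self._parts.append(data)

    def close(self):
        # Let the base class flush buffered input first; this may still
        # call handle_data() for trailing text.
        super().close()
        # End-of-input processing that needs to have seen everything.
        self.text = "".join(self._parts)


parser = TextCollector()
parser.feed("<p>Hello, ")
parser.feed("AT&T")
parser.close()
print(parser.text)  # Hello, AT&T
```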

<dl class="method">
<dt id="html.parser.HTMLParser.reset">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">reset</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.reset" title="Permalink to this definition">¶</a></dt>
<dd><p>Reset the instance. Loses all unprocessed data. This is called implicitly at instantiation time.</p>
</dd></dl>
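<p>A short sketch of discarding half-parsed input with <code class="docutils literal notranslate">reset()</code>, again using a hypothetical <code class="docutils literal notranslate">TagLogger</code> subclass. Note that <code class="docutils literal notranslate">reset()</code> only reinitializes the parser's own state; attributes that a subclass adds (here <code class="docutils literal notranslate">starttags</code>) are left untouched unless the subclass overrides <code class="docutils literal notranslate">reset()</code> as well.</p>

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that remembers the names of start tags."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append(tag)


parser = TagLogger()
parser.feed("<a href")    # incomplete start tag, held in the buffer
parser.reset()            # the buffered "<a href" is thrown away
parser.feed("<b>")
parser.close()
print(parser.starttags)   # ['b']
```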

<dl class="method">
<dt id="html.parser.HTMLParser.getpos">
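<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">getpos</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.getpos" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the current line number and offset as a <code class="docutils literal notranslate"><span class="pre">(lineno,</span> <span class="pre">offset)</span></code> tuple; line numbers start at 1 and offsets at 0.</p>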
</dd></dl>
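<p>Calling <code class="docutils literal notranslate">getpos()</code> from inside a handler is a convenient way to report where something was found; inside <code class="docutils literal notranslate">handle_starttag()</code> it gives the position at which the tag begins. A minimal sketch with a hypothetical <code class="docutils literal notranslate">PosLogger</code> subclass:</p>

```python
from html.parser import HTMLParser


class PosLogger(HTMLParser):
    """Hypothetical subclass: record (tag, (lineno, offset)) for each start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        self.positions.append((tag, self.getpos()))


parser = PosLogger()
parser.feed("<html>\n  <body>\n    <p>hi</p>\n")
parser.close()
print(parser.positions)  # [('html', (1, 0)), ('body', (2, 2)), ('p', (3, 4))]
```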

<dl class="method">
<dt id="html.parser.HTMLParser.get_starttag_text">
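<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">get_starttag_text</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.get_starttag_text" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the text of the most recently opened start tag. This should not normally be needed for structured processing, but may be useful when dealing with HTML "as deployed" or for re-generating input with minimal changes (for example, whitespace between attributes can be preserved).</p>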
</dd></dl>
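<p>For instance, a subclass that wants to copy start tags through unchanged can keep the raw text alongside the normalized <em>tag</em> and <em>attrs</em>. A sketch, with <code class="docutils literal notranslate">RawStartTags</code> as a hypothetical name:</p>

```python
from html.parser import HTMLParser


class RawStartTags(HTMLParser):
    """Hypothetical subclass: keep normalized and raw forms of each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag/attrs are normalized; get_starttag_text() is the source as written.
        self.seen.append((tag, attrs, self.get_starttag_text()))


parser = RawStartTags()
parser.feed('<A  HREF="x"   class=big>link</A>')
parser.close()
print(parser.seen)
# [('a', [('href', 'x'), ('class', 'big')], '<A  HREF="x"   class=big>')]
```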

<p>The following methods are called when data or markup elements are encountered, and they are meant to be overridden in a subclass. The base class implementations do nothing (except for <a class="reference internal" href="#html.parser.HTMLParser.handle_startendtag" title="html.parser.HTMLParser.handle_startendtag"><code class="xref py py-meth docutils literal notranslate"><span class="pre">handle_startendtag()</span></code></a>):</p>
<dl class="method">
<dt id="html.parser.HTMLParser.handle_starttag">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_starttag</code><span class="sig-paren">(</span><em class="sig-param">tag</em>, <em class="sig-param">attrs</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_starttag" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called to handle the start of a tag (e.g. <code class="docutils literal notranslate"><span class="pre">&lt;div</span> <span class="pre">id=&quot;main&quot;&gt;</span></code>).</p>
<p>The <em>tag</em> argument is the name of the tag converted to lower case. The <em>attrs</em> argument is a list of <code class="docutils literal notranslate"><span class="pre">(name,</span> <span class="pre">value)</span></code> pairs containing the attributes found inside the tag's <code class="docutils literal notranslate"><span class="pre">&lt;&gt;</span></code> brackets. The <em>name</em> is converted to lower case, quotes around the <em>value</em> are removed, and character and entity references are replaced.</p>
<p>For instance, for the tag <code class="docutils literal notranslate"><span class="pre">&lt;A</span> <span class="pre">HREF=&quot;https://www.cwi.nl/&quot;&gt;</span></code>, this method would be called as <code class="docutils literal notranslate"><span class="pre">handle_starttag('a',</span> <span class="pre">[('href',</span> <span class="pre">'https://www.cwi.nl/')])</span></code>.</p>
<p>All entity references from <a class="reference internal" href="html.entities.html#module-html.entities" title="html.entities: Definitions of HTML general entities."><code class="xref py py-mod docutils literal notranslate"><span class="pre">html.entities</span></code></a> are replaced in the attribute values.</p>
</dd></dl>
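<p>A typical use of <em>attrs</em> is pulling out a single attribute. The sketch below (the <code class="docutils literal notranslate">LinkCollector</code> class is hypothetical) converts the pair list to a <code class="docutils literal notranslate">dict</code> and shows that tag names arrive lower-cased and entity references in values are already decoded.</p>

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass: collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag is already lower-case, so "<A ...>" arrives as "a".
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; dict() keeps the last
        # value if an attribute is repeated.
        href = dict(attrs).get("href")
        if href is not None:  # None if href is missing or written without a value
            self.links.append(href)


parser = LinkCollector()
parser.feed('<A HREF="/search?q=html&amp;page=2">next</A> <a name=top>top</a>')
parser.close()
print(parser.links)  # ['/search?q=html&page=2']
```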

<dl class="method">
<dt id="html.parser.HTMLParser.handle_endtag">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_endtag</code><span class="sig-paren">(</span><em class="sig-param">tag</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_endtag" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called to handle the end tag of an element (e.g. <code class="docutils literal notranslate"><span class="pre">&lt;/div&gt;</span></code>).</p>
<p>The <em>tag</em> argument is the name of the tag converted to lower case.</p>
</dd></dl>
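<p>Since the parser itself does not check that end tags match start tags, a subclass that cares has to keep its own stack of open elements. A minimal sketch, assuming a hypothetical <code class="docutils literal notranslate">BalanceChecker</code> class and an abbreviated, non-exhaustive set of void elements:</p>

```python
from html.parser import HTMLParser


class BalanceChecker(HTMLParser):
    """Hypothetical subclass: flag end tags that don't close the current element."""

    # Abbreviated set of elements that never get an end tag (not exhaustive).
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open elements
        self.mismatched = []  # end tags that did not match the top of the stack

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<br/>"-style tags open and close at once; nothing to track.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.mismatched.append(tag)


checker = BalanceChecker()
checker.feed("<p><a href=#main>tag soup</p></a><br/>")
checker.close()
print(checker.mismatched)  # ['p']
print(checker.stack)       # ['p']  (never properly closed)
```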

<dl class="method">
<dt id="html.parser.HTMLParser.handle_startendtag">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_startendtag</code><span class="sig-paren">(</span><em class="sig-param">tag</em>, <em class="sig-param">attrs</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_startendtag" title="Permalink to this definition">¶</a></dt>
<dd><p>Similar to <a class="reference internal" href="#html.parser.HTMLParser.handle_starttag" title="html.parser.HTMLParser.handle_starttag"><code class="xref py py-meth docutils literal notranslate"><span class="pre">handle_starttag()</span></code></a>, but called when the parser encounters an XHTML-style empty tag (<code class="docutils literal notranslate"><span class="pre">&lt;img</span> <span class="pre">...</span> <span class="pre">/&gt;</span></code>). This method may be overridden by subclasses which require this particular lexical information; the default implementation simply calls <a class="reference internal" href="#html.parser.HTMLParser.handle_starttag" title="html.parser.HTMLParser.handle_starttag"><code class="xref py py-meth docutils literal notranslate"><span class="pre">handle_starttag()</span></code></a> and <a class="reference internal" href="#html.parser.HTMLParser.handle_endtag" title="html.parser.HTMLParser.handle_endtag"><code class="xref py py-meth docutils literal notranslate"><span class="pre">handle_endtag()</span></code></a>.</p>
</dd></dl>
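<p>The difference between the default and an overridden <code class="docutils literal notranslate">handle_startendtag()</code> can be sketched with two hypothetical subclasses, <code class="docutils literal notranslate">Events</code> and <code class="docutils literal notranslate">SelfClosingAware</code>. Note that a plain <code class="docutils literal notranslate">&lt;br&gt;</code> without the slash only produces a start tag event.</p>

```python
from html.parser import HTMLParser


class Events(HTMLParser):
    """Hypothetical subclass that logs start and end tag callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


class SelfClosingAware(Events):
    """Same, but reports XHTML-style empty tags as a single event."""

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startend", tag))


def parse(cls, markup):
    parser = cls()
    parser.feed(markup)
    parser.close()
    return parser.events


markup = '<img src="a.png" /><br>'
print(parse(Events, markup))            # [('start', 'img'), ('end', 'img'), ('start', 'br')]
print(parse(SelfClosingAware, markup))  # [('startend', 'img'), ('start', 'br')]
```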

<dl class="method">
<dt id="html.parser.HTMLParser.handle_data">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_data</code><span class="sig-paren">(</span><em class="sig-param">data</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_data" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called to process arbitrary data (e.g. text nodes and the content of <code class="docutils literal notranslate"><span class="pre">&lt;script&gt;...&lt;/script&gt;</span></code> and <code class="docutils literal notranslate"><span class="pre">&lt;style&gt;...&lt;/style&gt;</span></code>).</p>
</dd></dl>
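<p>Because script and style content also arrives through <code class="docutils literal notranslate">handle_data()</code>, extracting only the visible text means tracking which element is open. A minimal sketch with a hypothetical <code class="docutils literal notranslate">VisibleText</code> class; it joins the pieces because a single text run may be delivered in several calls.</p>

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Hypothetical subclass: collect text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


parser = VisibleText()
parser.feed("<p>Hi <b>there</b></p><script>var x = '<p>';</script><style>p{}</style>")
parser.close()
print("".join(parser.parts))  # Hi there
```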

<dl class="method">
<dt id="html.parser.HTMLParser.handle_entityref">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_entityref</code><span class="sig-paren">(</span><em class="sig-param">name</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_entityref" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called to process a named character reference of the form <code class="docutils literal notranslate"><span class="pre">&amp;name;</span></code> (e.g. <code class="docutils literal notranslate"><span class="pre">&amp;gt;</span></code>), where <em>name</em> is a general entity reference (e.g. <code class="docutils literal notranslate"><span class="pre">'gt'</span></code>). This method is never called if <em>convert_charrefs</em> is <code class="docutils literal notranslate"><span class="pre">True</span></code>.</p>
</dd></dl>

<dl class="method">
<dt id="html.parser.HTMLParser.handle_charref">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_charref</code><span class="sig-paren">(</span><em class="sig-param">name</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_charref" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called to process decimal and hexadecimal numeric character references of the form <code class="docutils literal notranslate"><span class="pre">&amp;#NNN;</span></code> and <code class="docutils literal notranslate"><span class="pre">&amp;#xNNN;</span></code>. For example, the decimal equivalent of <code class="docutils literal notranslate"><span class="pre">&amp;gt;</span></code> is <code class="docutils literal notranslate"><span class="pre">&amp;#62;</span></code>, whereas the hexadecimal one is <code class="docutils literal notranslate"><span class="pre">&amp;#x3E;</span></code>; in this case the method will receive <code class="docutils literal notranslate"><span class="pre">'62'</span></code> or <code class="docutils literal notranslate"><span class="pre">'x3E'</span></code>. This method is never called if <em>convert_charrefs</em> is <code class="docutils literal notranslate"><span class="pre">True</span></code>.</p>
</dd></dl>
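<p>When <em>convert_charrefs</em> is <code class="docutils literal notranslate">False</code>, a subclass has to decode references itself. The sketch below (the <code class="docutils literal notranslate">RefResolver</code> class is hypothetical) swaps in <code class="docutils literal notranslate">html.unescape()</code> in place of a <code class="docutils literal notranslate">name2codepoint</code> lookup by rebuilding a reference from the name each handler receives; unlike a dictionary lookup, this does not raise for an unknown name such as <code class="docutils literal notranslate">&amp;bogus;</code>, which is kept as written.</p>

```python
import html
from html.parser import HTMLParser


class RefResolver(HTMLParser):
    """Hypothetical subclass: rebuild text, resolving references by hand."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # so the ref handlers are called
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # "&" + name + ";" is a valid reference again; unknown names stay unchanged.
        self.parts.append(html.unescape("&%s;" % name))

    def handle_charref(self, name):
        # name is e.g. "62" or "x3E"; html.unescape() handles both forms.
        self.parts.append(html.unescape("&#%s;" % name))


parser = RefResolver()
parser.feed("<p>1 &lt; 2 &amp;&amp; 3 &#62; 2 &#x3E; 1 &bogus;</p>")
parser.close()
print("".join(parser.parts))  # 1 < 2 && 3 > 2 > 1 &bogus;
```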

<dl class="method">
<dt id="html.parser.HTMLParser.handle_comment">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_comment</code><span class="sig-paren">(</span><em class="sig-param">data</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_comment" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called when a comment is encountered (e.g. <code class="docutils literal notranslate"><span class="pre">&lt;!--comment--&gt;</span></code>).</p>
<p>For example, the comment <code class="docutils literal notranslate"><span class="pre">&lt;!--</span> <span class="pre">comment</span> <span class="pre">--&gt;</span></code> will cause this method to be called with the argument <code class="docutils literal notranslate"><span class="pre">'</span> <span class="pre">comment</span> <span class="pre">'</span></code>.</p>
<p>The content of Internet Explorer conditional comments (condcoms) is also sent to this method, so, for <code class="docutils literal notranslate"><span class="pre">&lt;!--[if</span> <span class="pre">IE</span> <span class="pre">9]&gt;IE9-specific</span> <span class="pre">content&lt;![endif]--&gt;</span></code>, this method will receive <code class="docutils literal notranslate"><span class="pre">'[if</span> <span class="pre">IE</span> <span class="pre">9]&gt;IE9-specific</span> <span class="pre">content&lt;![endif]'</span></code>.</p>
</dd></dl>
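<p>Since ordinary comments and conditional comments both arrive here, a subclass can tell them apart by looking at the text. A sketch with a hypothetical <code class="docutils literal notranslate">CommentCollector</code>; the <code class="docutils literal notranslate">[if</code> ... <code class="docutils literal notranslate">&lt;![endif]</code> test is an assumed heuristic, not something the module provides.</p>

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical subclass: separate plain comments from IE conditional comments."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.conditional = []

    def handle_comment(self, data):
        # Heuristic: condcom text looks like "[if IE 9]>...<![endif]".
        if data.startswith("[if") and data.endswith("<![endif]"):
            self.conditional.append(data)
        else:
            self.comments.append(data)


collector = CommentCollector()
collector.feed("<!-- note --><!--[if IE 9]>IE9 only<![endif]--><p>x</p>")
collector.close()
print(collector.comments)     # [' note ']
print(collector.conditional)  # ['[if IE 9]>IE9 only<![endif]']
```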

<dl class="method">
<dt id="html.parser.HTMLParser.handle_decl">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_decl</code><span class="sig-paren">(</span><em class="sig-param">decl</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_decl" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called to handle an HTML doctype declaration (e.g. <code class="docutils literal notranslate"><span class="pre">&lt;!DOCTYPE</span> <span class="pre">html&gt;</span></code>).</p>
<p>The <em>decl</em> parameter will be the entire contents of the declaration inside the <code class="docutils literal notranslate"><span class="pre">&lt;!...&gt;</span></code> markup (e.g. <code class="docutils literal notranslate"><span class="pre">'DOCTYPE</span> <span class="pre">html'</span></code>).</p>
</dd></dl>
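<p>As a small illustration, the text passed as <em>decl</em> is enough to tell an HTML5 doctype from an older one. The <code class="docutils literal notranslate">DoctypeSniffer</code> class and <code class="docutils literal notranslate">is_html5()</code> helper are hypothetical; whitespace and case are normalized before comparing.</p>

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Hypothetical subclass: remember the first declaration seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        if self.doctype is None:
            self.doctype = decl


def is_html5(markup):
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    if sniffer.doctype is None:
        return False
    # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
    return " ".join(sniffer.doctype.split()).lower() == "doctype html"


print(is_html5("<!DOCTYPE html><html></html>"))                     # True
print(is_html5('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))  # False
```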

<dl class="method">
<dt id="html.parser.HTMLParser.handle_pi">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">handle_pi</code><span class="sig-paren">(</span><em class="sig-param">data</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.handle_pi" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called when a processing instruction is encountered. The <em>data</em> parameter will contain the entire processing instruction. For example, for the processing instruction <code class="docutils literal notranslate"><span class="pre">&lt;?proc</span> <span class="pre">color='red'&gt;</span></code>, this method would be called as <code class="docutils literal notranslate"><span class="pre">handle_pi(&quot;proc</span> <span class="pre">color='red'&quot;)</span></code>. It is intended to be overridden by a derived class; the base class implementation does nothing.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The <a class="reference internal" href="#html.parser.HTMLParser" title="html.parser.HTMLParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code></a> class uses the SGML syntactic rules for processing instructions. An XHTML processing instruction using the trailing <code class="docutils literal notranslate"><span class="pre">'?'</span></code> will cause the <code class="docutils literal notranslate"><span class="pre">'?'</span></code> to be included in <em>data</em>.</p>
</div>
</dd></dl>
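<p>The effect described in the note can be seen by collecting both an SGML-style and an XML-style processing instruction. A sketch with a hypothetical <code class="docutils literal notranslate">PICollector</code> subclass; if the trailing <code class="docutils literal notranslate">'?'</code> is unwanted, the handler has to strip it itself.</p>

```python
from html.parser import HTMLParser


class PICollector(HTMLParser):
    """Hypothetical subclass: record the data of every processing instruction."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def handle_pi(self, data):
        self.pis.append(data)


parser = PICollector()
parser.feed("<?proc color='red'><?xml version=\"1.0\"?>")
parser.close()
print(parser.pis)  # ["proc color='red'", 'xml version="1.0"?']
```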

<dl class="method">
<dt id="html.parser.HTMLParser.unknown_decl">
<code class="sig-prename descclassname">HTMLParser.</code><code class="sig-name descname">unknown_decl</code><span class="sig-paren">(</span><em class="sig-param">data</em><span class="sig-paren">)</span><a class="headerlink" href="#html.parser.HTMLParser.unknown_decl" title="Permalink to this definition">¶</a></dt>
<dd><p>This method is called when an unrecognized declaration is read by the parser.</p>
<p>The <em>data</em> parameter will be the entire contents of the declaration inside the <code class="docutils literal notranslate"><span class="pre">&lt;![...]&gt;</span></code> markup. It is sometimes useful to override this in a derived class. The base class implementation does nothing.</p>
</dd></dl>

</div>
<div class="section" id="examples">
<span id="htmlparser-examples"></span><h2>Examples<a class="headerlink" href="#examples" title="Permalink to this headline">¶</a></h2>
<p>The following class implements a parser that will be used to illustrate more examples:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">html.parser</span> <span class="kn">import</span> <span class="n">HTMLParser</span>
<span class="kn">from</span> <span class="nn">html.entities</span> <span class="kn">import</span> <span class="n">name2codepoint</span>

<span class="k">class</span> <span class="nc">MyHTMLParser</span><span class="p">(</span><span class="n">HTMLParser</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">handle_starttag</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tag</span><span class="p">,</span> <span class="n">attrs</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Start tag:&quot;</span><span class="p">,</span> <span class="n">tag</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">attr</span> <span class="ow">in</span> <span class="n">attrs</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;     attr:&quot;</span><span class="p">,</span> <span class="n">attr</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_endtag</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tag</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;End tag  :&quot;</span><span class="p">,</span> <span class="n">tag</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Data     :&quot;</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_comment</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Comment  :&quot;</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_entityref</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="n">c</span> <span class="o">=</span> <span class="nb">chr</span><span class="p">(</span><span class="n">name2codepoint</span><span class="p">[</span><span class="n">name</span><span class="p">])</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Named ent:&quot;</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_charref</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">&#39;x&#39;</span><span class="p">):</span>
            <span class="n">c</span> <span class="o">=</span> <span class="nb">chr</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">name</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="mi">16</span><span class="p">))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">c</span> <span class="o">=</span> <span class="nb">chr</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">name</span><span class="p">))</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Num ent  :&quot;</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">handle_decl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Decl     :&quot;</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>

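<span class="c1"># convert_charrefs=False so that handle_entityref() and handle_charref() above are actually called</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">MyHTMLParser</span><span class="p">(</span><span class="n">convert_charrefs</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>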
</pre></div>
</div>
<p>Parse a doctype:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01//EN&quot; &#39;</span>
<span class="gp">... </span>            <span class="s1">&#39;&quot;http://www.w3.org/TR/html4/strict.dtd&quot;&gt;&#39;</span><span class="p">)</span>
<span class="go">Decl     : DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01//EN&quot; &quot;http://www.w3.org/TR/html4/strict.dtd&quot;</span>
</pre></div>
</div>
<p>Parse an element with a few attributes and a title:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;img src=&quot;python-logo.png&quot; alt=&quot;The Python logo&quot;&gt;&#39;</span><span class="p">)</span>
<span class="go">Start tag: img</span>
<span class="go">     attr: (&#39;src&#39;, &#39;python-logo.png&#39;)</span>
<span class="go">     attr: (&#39;alt&#39;, &#39;The Python logo&#39;)</span>
<span class="go">&gt;&gt;&gt;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;h1&gt;Python&lt;/h1&gt;&#39;</span><span class="p">)</span>
<span class="go">Start tag: h1</span>
<span class="go">Data     : Python</span>
<span class="go">End tag  : h1</span>
</pre></div>
</div>
<p>The content of <code class="docutils literal notranslate"><span class="pre">script</span></code> and <code class="docutils literal notranslate"><span class="pre">style</span></code> elements is returned as is, without further parsing:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;style type=&quot;text/css&quot;&gt;#python { color: green }&lt;/style&gt;&#39;</span><span class="p">)</span>
<span class="go">Start tag: style</span>
<span class="go">     attr: (&#39;type&#39;, &#39;text/css&#39;)</span>
<span class="go">Data     : #python { color: green }</span>
<span class="go">End tag  : style</span>

<span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;script type=&quot;text/javascript&quot;&gt;&#39;</span>
<span class="gp">... </span>            <span class="s1">&#39;alert(&quot;&lt;strong&gt;hello!&lt;/strong&gt;&quot;);&lt;/script&gt;&#39;</span><span class="p">)</span>
<span class="go">Start tag: script</span>
<span class="go">     attr: (&#39;type&#39;, &#39;text/javascript&#39;)</span>
<span class="go">Data     : alert(&quot;&lt;strong&gt;hello!&lt;/strong&gt;&quot;);</span>
<span class="go">End tag  : script</span>
</pre></div>
</div>
<p>Parse comments:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;!-- a comment --&gt;&#39;</span>
<span class="gp">... </span>            <span class="s1">&#39;&lt;!--[if IE 9]&gt;IE-specific content&lt;![endif]--&gt;&#39;</span><span class="p">)</span>
<span class="go">Comment  :  a comment</span>
<span class="go">Comment  : [if IE 9]&gt;IE-specific content&lt;![endif]</span>
</pre></div>
</div>
<p>Parse named and numeric character references and convert them to the correct character (note: these 3 references are all equivalent to <code class="docutils literal notranslate"><span class="pre">'&gt;'</span></code>):</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&amp;gt;&amp;#62;&amp;#x3E;&#39;</span><span class="p">)</span>
<span class="go">Named ent: &gt;</span>
<span class="go">Num ent  : &gt;</span>
<span class="go">Num ent  : &gt;</span>
</pre></div>
</div>
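<p>The effect of <em>convert_charrefs</em> can be checked directly. This is a minimal sketch. <code class="docutils literal notranslate"><span class="pre">RefLogger</span></code> and <code class="docutils literal notranslate"><span class="pre">run()</span></code> are hypothetical helpers that record which callback fires for the same input under each setting. The hexadecimal reference is reported with its <code class="docutils literal notranslate"><span class="pre">x</span></code> prefix, so a converting handler has to strip it before calling <code class="docutils literal notranslate"><span class="pre">int(...,</span> <span class="pre">16)</span></code>.</p>

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Hypothetical helper: records data and character-reference events."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_data(self, data):
        self.events.append(('data', data))

    def handle_entityref(self, name):
        # Never called when convert_charrefs=True
        self.events.append(('entityref', name))

    def handle_charref(self, name):
        # name is '62' for '&#62;' and 'x3E' for '&#x3E;'
        self.events.append(('charref', name))


def run(text, **kwargs):
    """Hypothetical helper: parse text and return the recorded events."""
    parser = RefLogger(**kwargs)
    parser.feed(text)
    parser.close()
    return parser.events


print(run('&gt;&#62;&#x3E;'))
# [('data', '>>>')]
print(run('&gt;&#62;&#x3E;', convert_charrefs=False))
# [('entityref', 'gt'), ('charref', '62'), ('charref', 'x3E')]
```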
<p>Feeding incomplete chunks to <a class="reference internal" href="#html.parser.HTMLParser.feed" title="html.parser.HTMLParser.feed"><code class="xref py py-meth docutils literal notranslate"><span class="pre">feed()</span></code></a> works, but <a class="reference internal" href="#html.parser.HTMLParser.handle_data" title="html.parser.HTMLParser.handle_data"><code class="xref py py-meth docutils literal notranslate"><span class="pre">handle_data()</span></code></a> might be called more than once (unless <em>convert_charrefs</em> is set to <code class="docutils literal notranslate"><span class="pre">True</span></code>):</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">&#39;&lt;sp&#39;</span><span class="p">,</span> <span class="s1">&#39;an&gt;buff&#39;</span><span class="p">,</span> <span class="s1">&#39;ered &#39;</span><span class="p">,</span> <span class="s1">&#39;text&lt;/s&#39;</span><span class="p">,</span> <span class="s1">&#39;pan&gt;&#39;</span><span class="p">]:</span>
<span class="gp">... </span>    <span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>
<span class="gp">...</span>
<span class="go">Start tag: span</span>
<span class="go">Data     : buff</span>
<span class="go">Data     : ered</span>
<span class="go">Data     : text</span>
<span class="go">End tag  : span</span>
</pre></div>
</div>
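<p>Because a single run of text may be split across several <code class="docutils literal notranslate"><span class="pre">handle_data()</span></code> calls, a common pattern is to buffer the pieces and join them after <code class="docutils literal notranslate"><span class="pre">close()</span></code>. Below is a minimal sketch of that pattern. <code class="docutils literal notranslate"><span class="pre">TextCollector</span></code> is a hypothetical name, and the exact number of pieces can vary, so only the joined result is relied on.</p>

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: buffers every handle_data() piece."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)


collector = TextCollector()
for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
    collector.feed(chunk)
collector.close()  # flush anything still buffered

# Join the pieces instead of assuming one call per text run.
text = ''.join(collector.pieces)
print(text)
# buffered text
```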
<p>Parsing invalid HTML (e.g. unquoted attributes) also works:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">parser</span><span class="o">.</span><span class="n">feed</span><span class="p">(</span><span class="s1">&#39;&lt;p&gt;&lt;a class=link href=#main&gt;tag soup&lt;/p &gt;&lt;/a&gt;&#39;</span><span class="p">)</span>
<span class="go">Start tag: p</span>
<span class="go">Start tag: a</span>
<span class="go">     attr: (&#39;class&#39;, &#39;link&#39;)</span>
<span class="go">     attr: (&#39;href&#39;, &#39;#main&#39;)</span>
<span class="go">Data     : tag soup</span>
<span class="go">End tag  : p</span>
<span class="go">End tag  : a</span>
</pre></div>
</div>
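<p>The attribute values recovered from tag soup can be used directly. This is a minimal sketch assuming you want the <code class="docutils literal notranslate"><span class="pre">href</span></code> values of <code class="docutils literal notranslate"><span class="pre">&lt;a&gt;</span></code> tags. <code class="docutils literal notranslate"><span class="pre">LinkExtractor</span></code> is a hypothetical name. <code class="docutils literal notranslate"><span class="pre">handle_starttag()</span></code> receives the attributes as a list of <code class="docutils literal notranslate"><span class="pre">(name,</span> <span class="pre">value)</span></code> pairs, and the value is <code class="docutils literal notranslate"><span class="pre">None</span></code> for attributes written without one.</p>

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == 'href' and value is not None:
                self.links.append(value)


extractor = LinkExtractor()
extractor.feed('<p><a class=link href=#main>tag soup</p ></a>')
extractor.close()
print(extractor.links)
# ['#main']
```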
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../contents.html">Table of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#"><code class="xref py py-mod docutils literal notranslate"><span class="pre">html.parser</span></code> --- Simple HTML and XHTML parser</a><ul>
<li><a class="reference internal" href="#example-html-parser-application">Example HTML Parser Application</a></li>
<li><a class="reference internal" href="#htmlparser-methods"><code class="xref py py-class docutils literal notranslate"><span class="pre">HTMLParser</span></code> Methods</a></li>
<li><a class="reference internal" href="#examples">Examples</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="html.html"
                        title="previous chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">html</span></code> --- HyperText Markup Language support</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="html.entities.html"
                        title="next chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">html.entities</span></code> --- Definitions of HTML general entities</a></p>
        </div>
      </div>
      <div class="clearer"></div>
    </div>  
    <div class="footer">
    &copy; <a href="../copyright.html">版权所有</a> 2001-2020, Python Software Foundation.
    <br />
    Python 软件基金会是一个非盈利组织。
    <a href="https://www.python.org/psf/donations/">请捐助。</a>
    <br />
    最后更新于 6月 29, 2020.
    <a href="../bugs.html">发现了问题</a>？
    <br />
    使用<a href="http://sphinx.pocoo.org/">Sphinx</a>2.3.1 创建。
    </div>

  </body>
</html>